Abstract

A new test statistic based on success runs of weighted deviations is
introduced. Its use for observations sampled from independent normal
distributions is worked out in detail. It supplements the classic $\chi^2$
test, which ignores the ordering of observations, and provides additional
sensitivity to local deviations from expectations. The exact distribution
of the statistic in the non-parametric case is derived and an algorithm to
compute p-values is presented. The computational complexity of the
algorithm is derived employing a novel identity for integer partitions.



Keywords: Success runs, p-value, $\chi^2$, Integer partitions, Measurements
with Gaussian uncertainty

2000 MSC: 62G10, 05A17, 60C05, 62P35

1. Introduction

In the course of scientific inference, we are faced with one basic task: 
comparing observations and model predictions. Based on this comparison, 
the hypothesized model may be either accepted or rejected. In the latter case
usually an improved model is sought. The comparison between observations
and the new model is then repeated until a satisfactory model has been
constructed. 

In model validation the goal is to provide quantitative test procedures.
The standard approach consists of defining a scalar function of the data $D$,
called test statistic $T(D)$, such that a large value of $T$ indicates a large
deviation of the data from the expectations under the hypothesized model
$\mathcal{H}$. Correspondingly, small $T$ is seen as good agreement. Let
$T_{\mathrm{obs}}$ denote the value of $T$ observed in the actual data set.
In order to facilitate the interpretation of $T$ (how large is too large?),
it is useful to introduce the p-value. Assuming $\mathcal{H}$, the p-value is
defined as the tail area probability to randomly sample a value of $T$
larger than $T_{\mathrm{obs}}$:

$$ p = P(T > T_{\mathrm{obs}} \mid \mathcal{H}). \tag{1} $$

If $\mathcal{H}$ is correct and all parameters are fixed, then $p$ is a random
variable with uniform distribution on $[0, 1]$. An incorrect model will
typically yield smaller values of $p$. This is used to guide model selection.
For the same data, different models will give different $p$. Similarly, a
different choice of the test statistic produces a different $p$ for the same
model and data. Why use different statistics? Because any one statistic is
sensitive to certain, but not to all, properties of the model.

To illustrate this, recall that in the majority of practical applications the
hypothesis $\mathcal{H}$ describing the set of $N$ observations $D = \{X_i\}$
is constructed with individual observations $X_i \in \mathbb{R}$ considered
independent. The discrete scalar index $i$ provides an ordering for the data.
It may represent time, length, energy, and so on. For concreteness, let us
assume independent, normally distributed variables
$X_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$. We can write the probability
density of the data as

$$ P(D \mid \mathcal{H}) = \prod_{i=1}^{N} P(X_i \mid \mu_i, \sigma_i^2) \propto \prod_{i=1}^{N} \exp\left(-\frac{(X_i - \mu_i)^2}{2\sigma_i^2}\right) = \exp\left(-\frac{\chi^2_N}{2}\right), \tag{2} $$

where $\chi^2_N = \sum_{i=1}^{N} (X_i - \mu_i)^2 / \sigma_i^2$ appears naturally;
it is the most widely used test statistic to probe $\mathcal{H}$; a large
$\chi^2_N$ translates directly into a small $P(D \mid \mathcal{H})$.
Note that $\chi^2_N / N$ is a measure of the average deviation per
observation, but it is blind to the ordering of the data points.

In this paper, we introduce a test statistic sensitive to local deviations of 
the data from expectations within an ordered data set. The test statistic we 
propose is valid for data which are expected to have equal probabilities to be 
below or above expectations. For concreteness, we consider the Xi normally 
distributed with known mean and variance, but the formulation is valid for 
any symmetric distribution. 



Statistics involving runs, i.e. sequences of observations that share a
common attribute commonly called a success, have drawn a lot of attention.
Good reviews are presented in [1], [2], [3]. Most of the early work was
centered around independent Bernoulli trials; cf. [4], [5] and [6]. After the
introduction of the Markov chain imbedding approach by [7], runs statistics
have been considered also for more complicated models with Markov
dependence and exchangeable binary trials in [8], [9], [10], [11] and [12].



In this paper we call an observation a success, S, if the observed value
exceeds the expected value. Similarly an expected value exceeding the
observation is considered a failure, F. Obviously the meaning of success and
failure may be reversed, and without loss of generality we may concentrate on
the success runs. Using the notation of [3] and counting convention of [10],
the simplest test statistics based on runs are the number of runs of length
exactly $k$, $E_{N,k}$, and the length of the longest run, $L_N$. As an
example consider the realization FSSFS; then $E_{5,1} = 1$ and $L_5 = 2$.
Observe that both $E_{N,k}$ and $L_N$ ignore relevant information: a success
is a success no matter how much $X_i$ exceeds its expected value.
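As a small illustration of these two classic statistics, the following
Python sketch counts success runs in a string of S and F symbols. The
function names are our own and purely illustrative; they do not come from
any of the cited references.

```python
from itertools import groupby


def success_run_lengths(seq):
    """Lengths of the maximal blocks of consecutive 'S' symbols, in order."""
    return [len(list(block)) for symbol, block in groupby(seq) if symbol == "S"]


def runs_of_exact_length(seq, k):
    """E_{N,k}: number of success runs of length exactly k."""
    return success_run_lengths(seq).count(k)


def longest_run(seq):
    """L_N: length of the longest success run (0 if there is no success)."""
    return max(success_run_lengths(seq), default=0)


# The realization from the text: E_{5,1} = 1 and L_5 = 2.
print(runs_of_exact_length("FSSFS", 1), longest_run("FSSFS"))  # prints: 1 2
```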

The goal of this paper is to enhance the existing procedures based on
$E_{N,k}$ or $L_N$ by introducing a new runs statistic $T$, similar in spirit
to $L_N$, which includes that extra information. $T$ is formally defined in
three steps:

1. Split the data $\{X_i\}$ into runs. Denote by
$A_j = \{X_{j_1}, X_{j_2}, \dots\}$ the set of observations in the $j$-th
success run.

2. Associate a weight with each success run. The weight $w_j = w(A_j)$
ought to be chosen such that a large weight indicates large discrepancy
between model and observations. A natural choice of the weight function
is a convenient one-to-one function of the probability (density) of
$A_j$ such as $w_j = [P(A_j \mid \mathcal{H})]^{-1}$ or
$w_j = \exp(-P(A_j \mid \mathcal{H}))$.

3. Choose $T$ as the largest weight:

$$ T = \max_j w_j. \tag{3} $$

We proceed as follows. In Sec. 2 we first derive the general expression
for $p = P(T > T_{\mathrm{obs}} \mid \mathcal{H})$ given a model with
independent observations and equal probability of success and failure. The
formulation is true for arbitrary weights. Next we give explicit results in
one concrete example of great importance where
$X_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$ with $\mu_i, \sigma_i^2$ known and
$w_j = \chi^2(A_j)$. For a large number of observations, $N > 80$, the
evaluation of the exact expressions for $p$ turns out to be highly demanding
both in terms of computer time and memory, as it scales with the number of
integer partitions. Thus we present a Monte Carlo method that works even for
$N > 1000$ and compare exact and approximate results. A selection of critical
values of $T$ for common confidence levels is tabulated. The power of $T$ is
studied in Sec. 3. Compared to $\chi^2$, tests based on $T$ are superior in
detecting departures from $\mathcal{H}$. This is demonstrated with a specific
but commonly arising example: the presence of an unexpected localized peak.
As final remarks, we discuss generalizations of $T$ to non-symmetric
uncertainties and composite hypotheses (parameter fits) in Sec. 4. In the
appendix we introduce integer partitions in more detail and derive the
recurrence relation for integer partitions needed to analyze the
computational complexity required for computing p-values for $T$.

2. Runs statistic 

Let us now make the definition of $T$ explicit in the following example.
The hypothesis $\mathcal{H}$ for the data $\{X_i\}$, $i = 1 \dots N$, is
formulated as:

1. All observations $\{X_i\}$ are independent.

2. Each observation is normally distributed,
$X_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$.

3. Mean $\mu_i$ and variance $\sigma_i^2$ are known.

We assume that at least one success, $X_i > \mu_i$ for some
$i \in \{1, 2, \dots, N\}$, has been observed. The set of observations
$D = \{X_i\}$ is partitioned into subsets containing the success and failure
runs. Let $A_j$ denote the subset of the observations of the $j$th success
run, $A_j = \{X_{j_1}, X_{j_2}, \dots\}$. The weight of the $j$th success run
is then taken to be

$$ w(A_j) \equiv \chi^2_{\mathrm{run},j} = \sum_i \frac{(X_i - \mu_i)^2}{\sigma_i^2}, \tag{4} $$

where the sum over $i$ is understood to cover all $X_i \in A_j$. The test
statistic is the largest weight of any run

$$ T = \max_j \chi^2_{\mathrm{run},j}. \tag{5} $$
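To make the definition concrete, the following minimal Python sketch
evaluates the largest run weight for a given data set in a single pass. The
function name `runs_statistic` and the list-based interface are
illustrative assumptions, not part of the original formulation.

```python
def runs_statistic(x, mu, sigma):
    """Largest chi^2 of any success run, cf. Eqs. (4) and (5).

    x, mu, sigma are equally long sequences of observations, known means
    and known standard deviations. Returns None if there is no success.
    """
    best = None
    current = 0.0
    for xi, mi, si in zip(x, mu, sigma):
        if xi > mi:
            # Success: extend the current run and update its weight.
            current += ((xi - mi) / si) ** 2
            best = current if best is None else max(best, current)
        else:
            # Failure: the current success run (if any) ends here.
            current = 0.0
    return best


# Two success runs {1.0, 2.0} and {1.5} with weights 5.0 and 2.25.
print(runs_statistic([1.0, 2.0, -1.0, 1.5], [0.0] * 4, [1.0] * 4))  # prints: 5.0
```

Failure runs can be treated with the same function by flipping the sign of
both the observations and the means.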

Our goal is to calculate the p-value
$p = P(T > T_{\mathrm{obs}} \mid N) = 1 - P(T < T_{\mathrm{obs}} \mid N)$.
Due to the symmetry of the normal distribution, for each observation the
chance of success is

$$ P(X_i \text{ is a success} \mid \mathcal{H}) = P(X_i > \mu_i \mid \mathcal{H}) = \frac{1}{2}. \tag{6} $$

The following analysis up to (16) is valid for any $\mathcal{H}$ such that
(6) holds. This symmetric Bernoulli property drastically simplifies the
calculation.

The key idea is that the set of all sequences of successes and failures in
$N$ Bernoulli trials can be decomposed into equivalence classes, and
$P(T < T_{\mathrm{obs}} \mid N)$ can be expressed as an expectation value over
inequivalent sequences.

For our purposes a sequence $\xi$ of length $N$ is sufficiently characterized
by the numbers $n_1, \dots, n_N$ denoting the number of success runs of
length one, $n_1$, of length two, $n_2$, and so on; we write
$n(\xi) = (n_1, \dots, n_N)$. Two sequences $\xi_1, \xi_2$ of length $N$ are
declared equivalent if they have the same success runs; i.e.

$$ \xi_1 \sim \xi_2 \Leftrightarrow n(\xi_1) = (n_1, \dots, n_N) = n(\xi_2). \tag{7} $$

If the last $n_{N-k}, \dots, n_N$ are zero they may be omitted. Reflexivity,
symmetry and transitivity of $\sim$ follow immediately. To illustrate
definition (7), consider the following example.

Let S [F] denote a success [failure], and consider the sequences
$\xi_1 =$ SSSFFSFS and $\xi_2 =$ FSFSSSFS. Both sequences exhibit two success
runs of length one, $n_1 = 2$, and one success run of length three,
$n_3 = 1$. Hence $n(\xi_1) = (2, 0, 1) = n(\xi_2)$, and the sequences are
equivalent, $\xi_1 \sim \xi_2$.
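The equivalence relation (7) is easy to check mechanically. The sketch
below, with the hypothetical helper `run_length_counts`, maps a sequence to
its vector of run counts with trailing zeros omitted, so that two sequences
are equivalent exactly when the returned tuples agree.

```python
from collections import Counter
from itertools import groupby


def run_length_counts(seq):
    """Return n(xi) = (n_1, n_2, ...) with trailing zeros omitted."""
    lengths = [len(list(block)) for symbol, block in groupby(seq) if symbol == "S"]
    if not lengths:
        return ()
    counts = Counter(lengths)
    # counts[l] is 0 for run lengths that do not occur.
    return tuple(counts[length] for length in range(1, max(lengths) + 1))


xi_1, xi_2 = "SSSFFSFS", "FSFSSSFS"
print(run_length_counts(xi_1))                              # prints: (2, 0, 1)
print(run_length_counts(xi_1) == run_length_counts(xi_2))   # prints: True
```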

In order to find all inequivalent sequences that need to be accounted for
it turns out to be most useful to fix the number of successes, $r$, and the
number of runs, $M$, with joint density $P(M, r \mid N)$. Thus by the law of
total probability

$$ P(T < T_{\mathrm{obs}} \mid N) = \sum_{r=1}^{N} \sum_{M=1}^{M_{\max}} P(T < T_{\mathrm{obs}} \mid M, r, N) \cdot P(M, r \mid N). \tag{8} $$

The maximum number of runs, $M_{\max}$, for fixed $r$ is determined as
follows: there can be no more runs than successes, so $M \le r$. On the other
hand, the runs have to be separated by at least one failure, hence
$M \le N - r + 1$. For a fixed number of observations, $N$, we have
$M \le \lfloor (N+1)/2 \rfloor$. It is easily verified that the latter
condition is implied by the first two, and the constraints are summarized as

$$ M_{\max} = \min(r, N - r + 1). \tag{9} $$

The joint distribution $P(M, r \mid N)$ is conveniently expressed as

$$ P(M, r \mid N) = \frac{1}{2^N} \cdot R(M \mid r, N), \tag{10} $$

where $R(M \mid r, N)$ denotes the number of (possibly equivalent) sequences
with $M$ success runs given $r$ successes in $N$ Bernoulli trials. As an
example consider $R(1 \mid 2, 3) = |\{\mathrm{SSF}, \mathrm{FSS}\}| = 2$. In
fact $R(M \mid r, N)$ can be calculated efficiently by a recursive algorithm,
but it will be seen to cancel out so that we have no need to compute it.

With $M, r, N$ fixed, we can decompose $P(T < T_{\mathrm{obs}} \mid M, r, N)$
into the desired average over inequivalent sequences

$$ P(T < T_{\mathrm{obs}} \mid M, r, N) = \sum_{\pi} P(T < T_{\mathrm{obs}} \mid \pi) \, P(\pi \mid M, r, N). \tag{11} $$

The key observation is that the set of inequivalent sequences
$\{\pi\} \subset \{\xi\}$ is in one-to-one correspondence with the set of
integer partitions of $r$ into exactly $M$ summands.

Due to their widespread applicability, the integer partitions have been
studied extensively: [13] devoted an entire book to the partitions. For an
online overview we refer to [14]. Efficient algorithms to construct all
partitions $\{\pi\}$ explicitly are well known; e.g. [15], [16]. These
algorithms scale linearly with the number of partitions. We refer to the
appendix for more details on integer partitions; there we derive the exact
number of sequences needed in calculating $P(T < T_{\mathrm{obs}} \mid N)$.
It grows asymptotically as
$\mathcal{O}\left(\frac{1}{N} e^{\pi\sqrt{2N/3}}\right)$, cf. (A.21).

The probability of one such sequence $\pi$, $P(\pi \mid M, r, N)$, is just
its multiplicity, $W(\pi)$, divided by the total number of elements in
$\{\xi\}$, which is $R(M \mid r, N)$. The multiplicity is found by basic urn
model considerations as the product of the number of ways to shuffle the
success runs and the number of ways to distribute the failures in between
and around the runs. While the former is just the multinomial coefficient

$$ \binom{M}{n_1, \dots, n_N}, $$

the latter is obtained as a binomial coefficient. Given $M$ runs and $N - r$
failures, $M - 1$ failures are needed to separate the runs, and the remaining
$N - r - M + 1$ failures can be allocated freely into the $M + 1$ slots
surrounding the runs. Using Eq. 1 from [17] we obtain

$$ W(\pi) = \binom{M}{n_1, \dots, n_N} \binom{N - r + 1}{M} = \frac{(N - r + 1)!}{(N - r + 1 - M)! \cdot \prod_l n_l!} = \frac{(N - r + 2 - M)_M}{\prod_l n_l!}, \tag{12} $$

with the Pochhammer symbol defined for positive integers $x, n$ as

$$ (x)_n \equiv \Gamma(x + n) / \Gamma(x) = (x + n - 1)! / (x - 1)! \tag{13} $$
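The multiplicity formula can be cross-checked by brute force for small $N$.
The following Python sketch, whose helper names are our own, enumerates all
$2^N$ sequences for $N = 8$, groups them by their sorted success-run lengths
and compares the class sizes with the product of the multinomial and
binomial coefficients described above.

```python
from collections import Counter
from itertools import groupby, product
from math import comb, factorial


def signature(seq):
    """Sorted success-run lengths, e.g. 'SSSFFSFS' -> (1, 1, 3)."""
    return tuple(sorted(len(list(g)) for s, g in groupby(seq) if s == "S"))


def multiplicity(run_lengths, n_obs):
    """W(pi): multinomial(M; n_1, ...) times binomial(N - r + 1, M)."""
    r, m = sum(run_lengths), len(run_lengths)
    w = factorial(m) * comb(n_obs - r + 1, m)
    for n_l in Counter(run_lengths).values():
        w //= factorial(n_l)  # exact: the quotient stays an integer
    return w


n_obs = 8
brute_force = Counter(signature(s) for s in product("SF", repeat=n_obs))
agree = all(brute_force[sig] == multiplicity(sig, n_obs) for sig in brute_force)
print(multiplicity((1, 1, 3), n_obs), agree)  # prints: 12 True
```

For the class of SSSFFSFS, three orderings of the runs times four placements
of the single free failure give the multiplicity 12.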

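Using the independence of the observations, the probability to observe a
value of $T$ smaller than a fixed $T_{\mathrm{obs}}$ in an entire sequence is
just the product of the probabilities of having $T < T_{\mathrm{obs}}$ in each
individual success run of length $l$, hence we find at once

$$ P(T < T_{\mathrm{obs}} \mid n) = \prod_l \left[ P(T < T_{\mathrm{obs}} \mid l) \right]^{n_l}. \tag{14} $$

As an example, consider again the sequence SSSFFSFS with runs distribution
$n = (2, 0, 1)$; then its contribution reads

$$ P(T < T_{\mathrm{obs}} \mid n) = P(T < T_{\mathrm{obs}} \mid l = 1)^2 \, P(T < T_{\mathrm{obs}} \mid l = 3). \tag{15} $$

As an intermediate result we note

$$ \begin{aligned} P(T < T_{\mathrm{obs}} \mid N) &= \sum_{r=1}^{N} \sum_{M=1}^{M_{\max}} \sum_{\pi} P(T < T_{\mathrm{obs}} \mid \pi) \, P(\pi \mid M, r, N) \, P(M, r \mid N) \\ &= \sum_{r=1}^{N} \sum_{M=1}^{M_{\max}} \sum_{\pi} \prod_l \left[ P(T < T_{\mathrm{obs}} \mid l) \right]^{n_l} \cdot \frac{(N - r + 2 - M)_M}{2^N \prod_l n_l!}, \end{aligned} \tag{16} $$

with $M_{\max} = \min(r, N - r + 1)$.

Equation (16) is useful for generalizations where
$P(X_i \text{ is a success} \mid \mathcal{H}) = \frac{1}{2}$ but the
individual $X_i$ are not normally distributed, since at this point it is
still left open what weight to use in order to quantify the discrepancy
between the model prediction and the observed outcome of individual runs.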

Assuming $X_i \sim \mathcal{N}(\mu_i, \sigma_i^2)$, it is most natural to use
the $\chi^2$ of each run because it corresponds directly to the probability
density of the data. The additional benefit of this choice is that
$P(T < T_{\mathrm{obs}} \mid l)$ is known exactly: it is just the cumulative
distribution function of the celebrated $\chi^2$-distribution with $l$
degrees of freedom:

$$ P(T < T_{\mathrm{obs}} \mid l) = \frac{\gamma\left(\frac{l}{2}, \frac{T_{\mathrm{obs}}}{2}\right)}{\Gamma\left(\frac{l}{2}\right)}. \tag{17} $$

In other words, it is the regularized incomplete gamma function, comprised
of the lower incomplete gamma function

$$ \gamma(a, x) = \int_0^x dt \, t^{a-1} e^{-t} \tag{18} $$

and the complete gamma function

$$ \Gamma(a) = \int_0^\infty dt \, t^{a-1} e^{-t}. \tag{19} $$

This is true even though the individual observations in a run are not
normally distributed, but according to the half-normal distribution, since
they are required to be successes. In fact, if $X_i$ is a random variable
distributed according to a standard normal distribution limited to the
domain $[a_i, b_i]$, $a_i, b_i \in \mathbb{R}$, the sampling distribution of

$$ X_1^2 + \dots + X_l^2 \tag{20} $$

is given by the $\chi^2$-distribution with $l$ degrees of freedom (17),
regardless of the domains $[a_i, b_i]$. The proof follows the traditional
lines by transforming to spherical coordinates. It is then seen that the
angular contributions (depending on $a_i, b_i$) are removed in the
normalization, and the radial behavior (independent of $a_i, b_i$) is the
$\chi^2$-distribution. See, e.g., [18, chap. 11] for details.
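The exact expression obtained by combining (16) with (17) can be sketched in
a few lines of standard-library Python. This is an illustrative
reimplementation under stated assumptions, not the authors' code: the
function names are hypothetical, the $\chi^2$ distribution function for
integer degrees of freedom is evaluated with the standard recurrence for the
regularized upper incomplete gamma function (started from the error function
for odd and from the exponential for even $l$), and the partitions are
generated by a simple recursion that is practical only for small $N$.

```python
import math
from collections import Counter


def chi2_cdf(x, dof):
    """CDF of the chi^2 distribution for integer dof >= 1, cf. Eq. (17)."""
    if x <= 0:
        return 0.0
    h = x / 2.0
    # Upper tail Q(a, h): Q(1/2, h) = erfc(sqrt(h)), Q(1, h) = exp(-h),
    # and Q(a + 1, h) = Q(a, h) + h^a exp(-h) / Gamma(a + 1).
    if dof % 2:
        a, q = 0.5, math.erfc(math.sqrt(h))
    else:
        a, q = 1.0, math.exp(-h)
    while a < dof / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return 1.0 - q


def partitions(n, largest=None):
    """Yield all partitions of n as non-increasing tuples of positive parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def exact_cdf(t_obs, n_obs):
    """P(T < t_obs | N) as the triple sum of Eq. (16)."""
    cdf = [0.0] + [chi2_cdf(t_obs, l) for l in range(1, n_obs + 1)]
    total = 0.0
    for r in range(1, n_obs + 1):
        slots = n_obs - r + 1
        for part in partitions(r):
            m = len(part)
            if m > slots:  # more runs than M_max = min(r, N - r + 1)
                continue
            weight = math.comb(slots, m) * math.factorial(m)
            prob = 1.0
            for length, n_l in Counter(part).items():
                weight //= math.factorial(n_l)
                prob *= cdf[length] ** n_l
            total += weight * prob
    return total / 2 ** n_obs


def exact_p_value(t_obs, n_obs):
    """p = 1 - P(T < t_obs | N)."""
    return 1.0 - exact_cdf(t_obs, n_obs)


if __name__ == "__main__":
    print(exact_p_value(8.8, 10))
```

Since the sum starts at $r = 1$, the sequence without any success does not
contribute, in line with the assumption that at least one success has been
observed.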

Now the derivation of the distribution of $T$ is completed; (16) combined
with (17) gives a complete specification that can be implemented in just a
few lines of code in Mathematica [19]. As an example,
$P(T > T_{\mathrm{obs}} \mid N = 25)$ is plotted as a function of
$T_{\mathrm{obs}}$ in Fig. 1. Since the number of partitions which contribute
to $P(T < T_{\mathrm{obs}} \mid N)$ grows rapidly with $N$ (see appendix for
details), we have to resort to a Monte Carlo approximation of the p-value
for $N > 80$. Note that the Monte Carlo output also serves as a valuable
cross check with the exact solution for small $N$. We now briefly describe
the Monte Carlo algorithm:

1. Fix a number of experiments, $K$, and the number of observations, $N$,
in each experiment.

2. Generate $K \cdot N$ standard normal variates.

3. In each of the $K$ experiments, find the largest $\chi^2_{\mathrm{run}}$
of any success run. This is $T_{\mathrm{obs},j}$ for the experiment $j$,
$j = 1 \dots K$. Filter out all experiments that contain no success.
Proceed analogously for the failure runs.

4. Calculate the empirical cumulative distribution function (ECDF) of
the set $\{T_{\mathrm{obs},j}\}_{j=1}^{K}$, both for successes and failures.
The approximate p-value is then given as 1 - ECDF.

Figure 1: p-value for the runs test statistic $T$ and $N = 25$ observations.
The Monte Carlo results for successes (green) and failures (red dashed) with
$K = 10000$ generated experiments are in excellent agreement with the exact
results (blue dotted) using (16).

In Fig. 1 the Monte Carlo results for success runs are indicated in green,
for failures in red, and the exact results are drawn in blue.
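The four Monte Carlo steps above can be sketched as follows. This is a
minimal standard-library illustration, assuming Python's `random.gauss` as
the source of standard normal variates; the function names are hypothetical.

```python
import random
from bisect import bisect_right


def largest_run_chi2(z):
    """Largest sum of squares over runs of positive entries (0.0 if none)."""
    best = current = 0.0
    for v in z:
        current = current + v * v if v > 0 else 0.0
        best = max(best, current)
    return best


def simulate_t(n_obs, n_exp, rng):
    """Steps 1-3: sorted T_obs of all experiments with at least one success."""
    values = []
    for _ in range(n_exp):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        if max(z) > 0:  # drop experiments without any success
            values.append(largest_run_chi2(z))
    return sorted(values)


def mc_p_value(t_obs, sorted_t):
    """Step 4: approximate p-value 1 - ECDF(t_obs)."""
    return 1.0 - bisect_right(sorted_t, t_obs) / len(sorted_t)


if __name__ == "__main__":
    sample = simulate_t(n_obs=25, n_exp=10000, rng=random.Random(1))
    print(mc_p_value(11.5, sample))
```

The failure runs are handled by passing the sign-flipped variates to
`largest_run_chi2`. The cost is proportional to $K \cdot N$, independent of
the number of partitions.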

For practical use, the critical values of $T$ for three often used confidence
levels $\alpha = 5\%, 1\%, 0.1\%$ are presented in Table 1. Note that for
fixed $\alpha$, the critical values vary approximately linearly with
$\log N$:

$$ T_{\mathrm{crit}}(N \mid \alpha) \approx c \cdot \log N + b(\alpha). \tag{21} $$

The slope $c$ appears to be nearly independent of $\alpha$. In Fig. 2, the
following parameter values are chosen:

$$ \begin{aligned} \alpha = 0.05 &: \quad c = 2.8, \; b = 2.5 \\ \alpha = 0.01 &: \quad c = 2.9, \; b = 6.1 \\ \alpha = 0.001 &: \quad c = 3.0, \; b = 11.6 \end{aligned} \tag{22} $$
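The log-linear rule of thumb is trivial to evaluate. The short Python sketch
below assumes that $\log$ denotes the natural logarithm; this is our reading,
chosen because it roughly reproduces the tabulated critical values. The
dictionary and function names are illustrative.

```python
import math

# Parameters (c, b) of Eq. (22), keyed by the confidence level alpha.
FIT_PARAMETERS = {0.05: (2.8, 2.5), 0.01: (2.9, 6.1), 0.001: (3.0, 11.6)}


def t_crit_approx(n_obs, alpha):
    """Approximate critical value c * log(N) + b(alpha), cf. Eq. (21)."""
    c, b = FIT_PARAMETERS[alpha]
    return c * math.log(n_obs) + b  # natural logarithm (assumption)


print(round(t_crit_approx(10, 0.05), 2))  # prints: 8.95
```

For $N = 10$ and $\alpha = 0.05$ this is to be compared with the tabulated
value 8.8.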



N                 5     10    25    50    100   500   1000
$\alpha = 0.05$   6.8   8.8   11.5  13.4  15.3  19.8  21.6
$\alpha = 0.01$   10.4  12.8  15.7  17.7  19.7  24.4  25.9
$\alpha = 0.001$  15.5  18.3  21.6  23.8  25.6  29.9  32.0

Table 1: Critical values of $T_{\mathrm{obs}}$ at the
$\alpha = 5\%, 1\%, 0.1\%$ level as a function of $N$. Up to $N = 50$ these
are found from the exact solution. For larger $N$, the critical values are
estimated from the Monte Carlo approximation using K = 10^ simulated
experiments and linear interpolation of $p(T_{\mathrm{obs}})$ based on the
points $(T_{\mathrm{obs},j}, p(T_{\mathrm{obs},j}))$, $j = 1 \dots K$.
Figure 2: Critical values of $T_{\mathrm{obs}}$ at the
$\alpha = 5\%, 1\%, 0.1\%$ level. $T_{\mathrm{crit}}$ scales approximately
linearly with $\log N$. The slope is nearly independent of $\alpha$.

3. Example

Let us discuss an example that frequently arises in high energy physics
to study the power of significance tests based on $T$. For comparison, we use
the classic $\chi^2$ test statistic.

Assume an experiment is conducted to observe the quantity $y = y(x)$.
The uncertainties are modeled as arising from a normal distribution with
known variance; then for each of the $N$ independent observations

$$ y_i \sim \mathcal{N}(\mu_i, \sigma_i^2). \tag{23} $$

The purpose of the experiment is to decide whether the currently accepted
hypothesis $\mathcal{H}$ is sufficient to explain the data. The predictions
derived from $\mathcal{H}$ are given as

$$ \mu_i = f(x_i). \tag{24} $$

In addition, assume there exists an extension to $\mathcal{H}$, denoted by
$\mathcal{H}_1$, whose predictions are

$$ \mu_i = f(x_i) + g(x_i). \tag{25} $$

Typically the extra contribution $g(x)$ is significant in a narrow region
only. For concreteness, we assume it is a localized peak of the
Cauchy-Lorentz form with location parameter $\beta$ and scale parameter
$\gamma$

$$ g(x) = A \cdot \left( 1 + \frac{(x - \beta)^2}{\gamma^2} \right)^{-1}. \tag{26} $$

The magnitude of the extra contribution is defined by $A$. Three cases are
to be distinguished. For $A \to 0$, $\mathcal{H}_1 = \mathcal{H}$. For fixed
confidence level $\alpha$, tests based on $T$ and $\chi^2$ reject
$\mathcal{H}$ with the nominal probability $\alpha$.

For $A \to \infty$, $\mathcal{H}$ is rejected with probability 1 for either
statistic. In the most interesting region, $A$ not too small and not too
large, we study the rejection power of $T$ and $\chi^2$ by simulating
experiments under $\mathcal{H}_1$. We then analyze the data under
$\mathcal{H}$ and estimate the power as the fraction of times the p-value is
found in the rejection region defined by the confidence level
$\alpha = 0.05$. We simulate an ensemble of 10000 experiments with $N = 10$
draws from $\mathcal{H}_1$ with $x_i = i$, $i = 1 \dots 10$, and parameters
$\beta = 5.5$, $\gamma = 2$ fixed for different values of $A$. Without loss
of generality, we choose $f(x) = 0$, $\sigma_i = 1$.
The numerical results are shown in Figure 3 as a function of $A$. The power
of $T$ equals the power of $\chi^2$ for $A = 0$ and $A \gg 1$ as expected. In
the intermediate region, the power of $T$ significantly exceeds that of
$\chi^2$. Similar results are obtained when keeping $A$ fixed and varying
$\gamma$ instead.

Figure 3: Power of statistics $T$ and $\chi^2$ in rejecting the null
hypothesis of normality around zero mean at the 5% confidence level. For
fixed $A$, 10000 experiments, each of sample size 10, have been generated
from a normal distribution with variance one. The mean of sample $i$,
$i = 1 \dots 10$, follows the Cauchy-Lorentz form
$A \cdot \left(1 + (x_i - \beta)^2 / \gamma^2\right)^{-1}$ of Eq. (26).
A sample data set ($A = 1.5$) is shown in the inset. The curves show the
power as a function of the amplitude $A$.

Moreover, if we choose a distribution with light tails (e.g. a normal
distribution) for $g(x)$ instead of the heavy-tailed Cauchy distribution,
the qualitative results are unaffected. The power of $T$ is larger than the
power of $\chi^2$ for the alternatives $\mathcal{H}_1$. For medium sized
$g(x)$, the difference can reach up to 40%.
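The power study can be sketched with a short simulation. The Python code
below is an illustration under stated assumptions rather than the original
analysis: it uses the peak shape of Eq. (26) as reconstructed above, rejects
on $T$ above the tabulated critical value 8.8 for $N = 10$ (success runs
only, which suits a positive peak), and rejects on $\chi^2$ above 18.307, the
95% quantile of the $\chi^2$ distribution with 10 degrees of freedom. All
function names are our own.

```python
import random

T_CRIT = 8.8        # Table 1: N = 10, alpha = 0.05
CHI2_CRIT = 18.307  # 95% quantile of chi^2 with 10 degrees of freedom


def peak(x, amplitude, beta=5.5, gamma=2.0):
    """Cauchy-Lorentz peak of Eq. (26)."""
    return amplitude / (1.0 + ((x - beta) / gamma) ** 2)


def largest_run_chi2(z):
    """Largest sum of squares over runs of positive entries (0.0 if none)."""
    best = current = 0.0
    for v in z:
        current = current + v * v if v > 0 else 0.0
        best = max(best, current)
    return best


def power(amplitude, n_exp, rng):
    """Fraction of experiments drawn under H_1 in which H is rejected.

    Returns the pair (power of T, power of chi^2) for f(x) = 0, sigma = 1.
    """
    hits_t = hits_chi2 = 0
    for _ in range(n_exp):
        y = [rng.gauss(peak(i, amplitude), 1.0) for i in range(1, 11)]
        hits_t += largest_run_chi2(y) > T_CRIT
        hits_chi2 += sum(v * v for v in y) > CHI2_CRIT
    return hits_t / n_exp, hits_chi2 / n_exp


if __name__ == "__main__":
    rng = random.Random(7)
    for amplitude in (0.0, 1.5, 5.0):
        print(amplitude, power(amplitude, 10000, rng))
```

At $A = 0$ both fractions should be close to the nominal 5%, and for large
$A$ both approach one; the intermediate amplitudes are where the two
statistics differ.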

4. Discussion 

We have introduced the test statistic $T$ and calculated its distribution for
the case of a sequence of independent observations, each following a normal
distribution with known mean and variance. Implementing the algorithm to
calculate critical values of $T$ for the various confidence levels is
straightforward, but the execution time grows rapidly with the number of
observations $N$. Hence a Monte Carlo scheme is recommended to calculate
critical values for $N > 80$, yielding results in reasonable time even for
$N > 1000$, thus covering virtually the whole range of interest relevant to
everyday problems. We have verified that the Monte Carlo results agree well
with exact results for small $N$.

We have demonstrated the usefulness of T and recommend its usage for 
hypothesis testing especially against alternatives with additional local peaks. 

The more common problem in data analysis is to consider a composite
hypothesis: in a first step free parameters of the model are estimated from
the data ("fit") and in the second step predictions, based on the fitted
parameters, and observations are compared ("goodness of fit"). With most test
statistics the effect of fitted parameters on the sampling distribution of the
statistic is not analytically known. The only notable exception to this rule
is the $\chi^2$ statistic: for $k$ parameters extracted from maximizing the
likelihood of $N$ normal observations, the number of degrees of freedom is
$N - k$, instead of $N$ in case all parameters are known a priori.
Unfortunately, this cannot be extended to the runs statistic $T$ considered
here. However, what we can do is to simulate data sets using a Monte Carlo
approach, and study the approximate numerical distribution of $T$. For the
simplest case of a straight line and a maximum likelihood fit to 10 data
points, the results are shown in Fig. 4. It is evident that $p(T)$ drops to
zero much more sharply for the fitted data (red=successes, green=failures)
than for the exact results with no parameters fitted (blue). Accordingly, the
critical values for fitted $T$ at level $\alpha = 5\%, 1\%, 0.1\%$ are
$T_{\mathrm{crit}} = 6.0, 8.5, 12.4$. In general, the qualitative effect of
fitting parameters but pretending that they were known before the data was
taken is that the p-value is not distributed uniformly. Instead, its
distribution is biased towards $p = 1$, leading to conservative decisions.
The quantitative effect depends on the number of observations and parameters,
the maximization condition determining the best fit parameters (likelihood,
posterior ...) and possibly other effects.
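The effect of fitting can be explored with a small simulation. The sketch
below is an illustration under assumptions of our own: the abscissae are
taken as $x_i = 1, \dots, 10$ (the design is not specified in the text),
the maximum likelihood fit for equal Gaussian uncertainties is carried out
as an ordinary least-squares fit, and the function names are hypothetical.

```python
import random


def largest_run_chi2(z):
    """Largest sum of squares over runs of positive entries (0.0 if none)."""
    best = current = 0.0
    for v in z:
        current = current + v * v if v > 0 else 0.0
        best = max(best, current)
    return best


def fit_line(x, y):
    """Least-squares slope and intercept (maximum likelihood for equal sigma)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fitted_t_sample(n_exp, rng, n_obs=10):
    """Sorted T_obs computed from residuals with respect to the fitted line."""
    x = list(range(1, n_obs + 1))  # assumed design
    values = []
    for _ in range(n_exp):
        y = [rng.gauss(1.0 * xi + 0.0, 1.0) for xi in x]
        slope, intercept = fit_line(x, y)
        residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
        values.append(largest_run_chi2(residuals))
    return sorted(values)


def critical_value(sorted_t, alpha):
    """Empirical (1 - alpha) quantile of the simulated statistic."""
    return sorted_t[int((1.0 - alpha) * len(sorted_t))]


if __name__ == "__main__":
    sample = fitted_t_sample(10000, random.Random(3))
    print(critical_value(sample, 0.05))
```

The resulting 5% critical value can be compared with the value 8.8 that
applies when no parameters are fitted.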

Through Monte Carlo approximations the use of the runs statistic $T$ can
be further generalized to the important class of problems involving
asymmetric uncertainties like Binomial or Poisson distributions. All that
needs to be changed is the weight of individual runs. As a starting point one
could define $T$ as the smallest probability (density) of any run,
$T = \min_j P(A_j \mid \mathcal{H})$. Numerically the distribution of $T$ is
then found in analogous fashion to the algorithm described in the caption of
Fig. 4. An implementation of this algorithm is scheduled to be included in a
future release of BAT, the Bayesian Analysis Toolkit [20]. BAT is a C++
library based on the Markov Chain Monte Carlo approach which offers routines
for fitting, limit setting, goodness of fit and more. Using the Metropolis
algorithm [21] it is possible to simulate the data sets needed for
approximate p-value calculations.

Figure 4: Distribution of runs test statistic $T$ with and without fitted
parameters. The Monte Carlo results for successes (green) and failures (red
dashed) are obtained from $K = 10000$ generated experiments. Each data set
consists of $N = 10$ data points $(x_i, y_i)$, where the $y_i$ are normally
distributed around a straight line of unit slope and zero intercept,
$y_i \sim \mathcal{N}(\mu = 1 \cdot x_i + 0, \sigma^2 = 1)$. Then a maximum
likelihood fit is performed to extract the two parameters of a straight line
model $y = m \cdot x + b$ (see inset). Finally the predictions are calculated
from the fitted model, and $T_{\mathrm{obs}}$ is determined for each
experiment. With the set of 10000 values of $T_{\mathrm{obs}}$ the empirical
CDF (ECDF) is computed, and $1 - \mathrm{ECDF}(T_{\mathrm{obs}})$ is plotted.
For comparison the exact results (blue dotted) for $N = 10$ using Eqs. (16),
(17) are shown. The effect of fitting is that $p(T)$ drops more sharply,
hence the critical values are pushed towards smaller $T$; e.g. at the 5%
level $T_{\mathrm{crit}} = 6.0$ (fit) vs $T_{\mathrm{crit}} = 8.8$ (no fit).



Appendix

Integer Partitions and Computational Complexity

We are now interested in the number of sequences, $\nu(N)$, which need to
be taken into account to calculate a p-value for $T$,
$P(T > T_{\mathrm{obs}} \mid N)$, using (16). Put differently, $\nu(N)$ is
the number of terms in the multiple sum

$$ \sum_{r=1}^{N} \; \sum_{M=1}^{\min(r, N-r+1)} \; \sum_{\pi} \tag{A.1} $$

where $\sum_{\pi}$ extends over all inequivalent sequences with $r$ successes
distributed in $M$ runs, see (7) and (11). Since $\nu(N)$ determines the
number of steps needed to calculate the p-value on a computer, knowing the
form of the $N$-dependence aids in ascertaining whether the computer can be
expected to finish the calculation in reasonable time. In the main result of
this section, Proposition 1, $\nu(N)$ is essentially given by the number of
integer partitions. To begin with, we introduce the integer partitions and
illustrate with an example. The book [13] by Andrews is a good reference
devoted entirely to partitions.

Definition 1. Let $\mathrm{Part}(N)$ denote the number of partitions of the
integer $N$ into a sum of one or more positive integers. For consistency it
is useful to define $\mathrm{Part}(0) = 1$. Let $\mathrm{Part}(N, k)$ denote
the number of partitions of $N$ into exactly $k$ addends and finally let
$\mathrm{Part}_{\le}(N, i)$ denote the number of partitions of $N$ into
integers of at most size $i$, with $\mathrm{Part}_{\le}(0, i) = 1$.

Example 1. The integer 5 can be written in $\mathrm{Part}(5) = 7$ different
ways:

5 = 5 (A.2)
  = 4 + 1 (A.3)
  = 3 + 2 (A.4)
  = 3 + 1 + 1 (A.5)
  = 2 + 2 + 1 (A.6)
  = 2 + 1 + 1 + 1 (A.7)
  = 1 + 1 + 1 + 1 + 1 (A.8)

One can see that 5 can be decomposed as a sum of exactly three non-zero
integers in two ways (Eqs. (A.5) and (A.6)), thus $\mathrm{Part}(5, 3) = 2$.
Furthermore, the number of ways to partition 5 into addends less than 3 is
$\mathrm{Part}_{\le}(5, 2) = 3$ (Eqs. (A.6)-(A.8)).
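The three partition numbers of Definition 1 can be computed with a short
memoized recursion. The sketch below uses the standard recurrence
$\mathrm{Part}(n, k) = \mathrm{Part}(n-1, k-1) + \mathrm{Part}(n-k, k)$ and
the fact that partitions into parts of size at most $i$ are equinumerous
with partitions into at most $i$ parts; the function names are our own.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def part_exact(n, k):
    """Part(n, k): partitions of n into exactly k positive addends."""
    if n == 0 and k == 0:
        return 1
    if n <= 0 or k <= 0:
        return 0
    # Either one addend equals 1 (remove it), or all addends are >= 2
    # (subtract 1 from each of the k addends).
    return part_exact(n - 1, k - 1) + part_exact(n - k, k)


def part_total(n):
    """Part(n): all partitions of n, with Part(0) = 1."""
    return sum(part_exact(n, k) for k in range(n + 1))


def part_at_most(n, i):
    """Partitions of n into addends of size at most i (equals at most i addends)."""
    return sum(part_exact(n, k) for k in range(min(n, i) + 1))


print(part_total(5), part_exact(5, 3), part_at_most(5, 2))  # prints: 7 2 3
```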

The three partition numbers just defined are obviously closely connected;
we shall need the following relations. Elementary proofs based on Ferrers
diagrams can be found in the books by Andrews [13, chap. 1] and Knuth
[15, chap. 7.2.1.4].



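Fact 1. Assuming $N \ge 1$, Def. 1 yields:

$$ \mathrm{Part}(N) = \sum_{r=1}^{N} \mathrm{Part}(N, r) \tag{A.9} $$

$$ \mathrm{Part}_{\le}(N, r) = \sum_{M=1}^{r} \mathrm{Part}(N, M) \tag{A.10} $$

$$ \mathrm{Part}_{\le}(M, r - M) = \mathrm{Part}(r, r - M) \tag{A.11} $$

$$ \mathrm{Part}(N) = \sum_{r=0}^{N-1} \mathrm{Part}_{\le}(r, N - r) \tag{A.12} $$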

Proposition 1. Let $\nu(N)$ denote the number of inequivalent Bernoulli
sequences of length $N$, where the probability of a success is $\frac{1}{2}$
in each trial and the equivalence relation is defined in (7). Then

$$ \nu(N) = \sum_{r=1}^{N} \; \sum_{M=1}^{\min(r, N-r+1)} \mathrm{Part}(r, M) \tag{A.13} $$

$$ \phantom{\nu(N)} = \mathrm{Part}(N + 1) - 1 \tag{A.14} $$

Proof. We start from the right hand side of the proposition using (A.12):

$$ \mathrm{Part}(N + 1) - 1 = -1 + \sum_{r=0}^{N} \mathrm{Part}_{\le}(r, N + 1 - r) \tag{A.15} $$

$$ \phantom{\mathrm{Part}(N + 1) - 1} = \sum_{r=1}^{N} \mathrm{Part}_{\le}(r, N + 1 - r). \tag{A.16} $$

Now using (A.10):

$$ \mathrm{Part}(N + 1) - 1 = \sum_{r=1}^{N} \sum_{M=1}^{N-r+1} \mathrm{Part}(r, M). \tag{A.17} $$

But we know that we cannot partition $r$ successes into more than $r$ runs,
so $\mathrm{Part}(r, M > r) = 0$, hence

$$ \mathrm{Part}(N + 1) - 1 = \sum_{r=1}^{N} \; \sum_{M=1}^{\min(r, N-r+1)} \mathrm{Part}(r, M) \tag{A.18} $$

$$ \phantom{\mathrm{Part}(N + 1) - 1} = \nu(N). \tag{A.19} $$

□
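The identity of Proposition 1 is easily verified numerically for small $N$.
The following Python sketch evaluates the double sum (A.13) directly and
compares it with $\mathrm{Part}(N+1) - 1$; the helper names are illustrative
and the recurrence for $\mathrm{Part}(n, k)$ is the standard one.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def part_exact(n, k):
    """Part(n, k): partitions of n into exactly k positive addends."""
    if n == 0 and k == 0:
        return 1
    if n <= 0 or k <= 0:
        return 0
    return part_exact(n - 1, k - 1) + part_exact(n - k, k)


def part_total(n):
    """Part(n): number of all partitions of n."""
    return sum(part_exact(n, k) for k in range(n + 1))


def nu(n_obs):
    """Number of inequivalent sequences: the double sum of Eq. (A.13)."""
    return sum(
        part_exact(r, m)
        for r in range(1, n_obs + 1)
        for m in range(1, min(r, n_obs - r + 1) + 1)
    )


# Check Eq. (A.14) for N = 1, ..., 40.
print(all(nu(n) == part_total(n + 1) - 1 for n in range(1, 41)))  # prints: True
print(nu(5))  # prints: 10
```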

Since $\mathrm{Part}(r, M)$ represents the number of elements in
$\sum_{\pi}$ of (A.1), $\nu(N)$ is the exact number of sequences that
contribute to $P(T > T_{\mathrm{obs}} \mid N)$. We can approximate $\nu(N)$
by employing the asymptotic expression of $\mathrm{Part}(N)$ for large $N$
first derived by Hardy and Ramanujan [22]

$$ \mathrm{Part}(N) \approx \frac{\exp\left(\pi \sqrt{2/3 \cdot N}\right)}{4 \sqrt{3} \, N}. \tag{A.20} $$

Hence, for large $N$, $\nu(N)$ grows nearly exponentially.

Corollary 1. For large $N$, $\nu(N)$ is approximately given by

$$ \nu(N) \approx \frac{\exp\left(\pi \sqrt{2/3 \cdot (N + 1)}\right)}{4 \sqrt{3} \, (N + 1)}. \tag{A.21} $$

This implies that for large $N$ (say $N = 1000$), in equations (8), (11) the
sum is over more partitions
($\nu(N) = 2.5 \times 10^{31} \approx 2^{104}$) than a current 64-bit desktop
computer could even address in memory. In practice the exact evaluation of
$P(T > T_{\mathrm{obs}} \mid N)$ becomes too slow already for $N > 80$, where
$\nu(80) = 1.8 \times 10^{7}$. In contrast, a Monte Carlo solution based on
sampling a large number of batches, $K$, each with $N$ pseudo random numbers
is much faster: its computational complexity is $\mathcal{O}(K \cdot N)$.
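The quoted orders of magnitude can be reproduced with exact integer
arithmetic. The sketch below counts partitions with a standard dynamic
programming scheme (adding one allowed part size at a time) and compares the
exact $\nu(N) = \mathrm{Part}(N+1) - 1$ with the asymptotic estimate
(A.21); the function names are our own.

```python
import math


def partition_numbers(n_max):
    """Return the list [Part(0), Part(1), ..., Part(n_max)] as exact integers."""
    counts = [1] + [0] * n_max
    for part in range(1, n_max + 1):        # allow addends of size `part`
        for total in range(part, n_max + 1):
            counts[total] += counts[total - part]
    return counts


def nu_exact(n_obs, counts):
    """nu(N) = Part(N + 1) - 1, cf. Eq. (A.14)."""
    return counts[n_obs + 1] - 1


def nu_asymptotic(n_obs):
    """Hardy-Ramanujan estimate of Eq. (A.21)."""
    n = n_obs + 1
    return math.exp(math.pi * math.sqrt(2.0 * n / 3.0)) / (4.0 * math.sqrt(3.0) * n)


if __name__ == "__main__":
    counts = partition_numbers(1001)
    for n_obs in (80, 1000):
        exact = nu_exact(n_obs, counts)
        print(n_obs, f"{exact:.2e}", f"{nu_asymptotic(n_obs):.2e}")
```

The asymptotic formula overestimates the exact count by a few percent in
this range, which is immaterial for the complexity argument.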
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